A bottom-up OCR system for mathematical formulas recognition
Internal identifier: 001190 (Main/Exploration); previous: 001189; next: 001191
Authors: WEI WU [People's Republic of China]; FENG LI [People's Republic of China]; JUN KONG [People's Republic of China]; LICHANG HOU [People's Republic of China]; BINGDUI ZHU [People's Republic of China]
Source:
- Lecture notes in computer science [ 0302-9743 ] ; 2006.
French descriptors
- Pascal (Inist)
- Intelligence artificielle, Reconnaissance caractère, Reconnaissance optique caractère, Image binaire, Reconnaissance forme, Traitement image, Segmentation image, Autoorganisation, Emploi, Formule mathématique, Caractère imprimé, Document imprimé, Grammaire, Latex, Méthode ascendante, Réseau neuronal, Editeur texte.
- Wicri :
- topic : Intelligence artificielle.
English descriptors
- KwdEn :
- Artificial intelligence, Binary image, Bottom up method, Character recognition, Employment, Grammar, Image processing, Image segmentation, Latex, Mathematical formula, Neural network, Optical character recognition, Pattern recognition, Printed character, Printed document, Self organization, Text editor.
Abstract
An OCR system is presented to understand mathematical formulas in binary printed document images. The system utilizes a novel component-labeling algorithm for extracting local maximum components from image, and uses these components to locate the mathematical formulas. A character recognition algorithm based on neural networks is then adopted. For segmenting merged characters in the image, a novel segmentation algorithm based on a modified SOM neural network was introduced into the system. With the employment of LL(1) grammar, this system can convert the recognition results into a LATEX file.
Links toward previous steps (curation, corpus...)
- to stream PascalFrancis, to step Corpus: 000314
- to stream PascalFrancis, to step Curation: 000472
- to stream PascalFrancis, to step Checkpoint: 000363
- to stream Main, to step Merge: 001224
- to stream Main, to step Curation: 001190
The document in XML format
<record><TEI><teiHeader><fileDesc><titleStmt><title xml:lang="en" level="a">A bottom-up OCR system for mathematical formulas recognition</title>
<author><name sortKey="Wei Wu" sort="Wei Wu" uniqKey="Wei Wu" last="Wei Wu">WEI WU</name>
<affiliation wicri:level="1"><inist:fA14 i1="01"><s1>Dept. Appl. Math., Dalian University of Technology</s1>
<s2>Dalian 116024</s2>
<s3>CHN</s3>
<sZ>1 aut.</sZ>
<sZ>2 aut.</sZ>
<sZ>4 aut.</sZ>
<sZ>5 aut.</sZ>
</inist:fA14>
<country>République populaire de Chine</country>
<wicri:noRegion>Dalian 116024</wicri:noRegion>
</affiliation>
</author>
<author><name sortKey="Feng Li" sort="Feng Li" uniqKey="Feng Li" last="Feng Li">FENG LI</name>
<affiliation wicri:level="1"><inist:fA14 i1="01"><s1>Dept. Appl. Math., Dalian University of Technology</s1>
<s2>Dalian 116024</s2>
<s3>CHN</s3>
<sZ>1 aut.</sZ>
<sZ>2 aut.</sZ>
<sZ>4 aut.</sZ>
<sZ>5 aut.</sZ>
</inist:fA14>
<country>République populaire de Chine</country>
<wicri:noRegion>Dalian 116024</wicri:noRegion>
</affiliation>
</author>
<author><name sortKey="Jun Kong" sort="Jun Kong" uniqKey="Jun Kong" last="Jun Kong">JUN KONG</name>
<affiliation wicri:level="1"><inist:fA14 i1="02"><s1>Northeast Normal University</s1>
<s2>Changchun 130024</s2>
<s3>CHN</s3>
<sZ>3 aut.</sZ>
</inist:fA14>
<country>République populaire de Chine</country>
<placeName><settlement type="city">Changchun</settlement>
<region type="province">Jilin</region>
<region type="groupement">Dongbei</region>
</placeName>
</affiliation>
</author>
<author><name sortKey="Lichang Hou" sort="Lichang Hou" uniqKey="Lichang Hou" last="Lichang Hou">LICHANG HOU</name>
<affiliation wicri:level="1"><inist:fA14 i1="01"><s1>Dept. Appl. Math., Dalian University of Technology</s1>
<s2>Dalian 116024</s2>
<s3>CHN</s3>
<sZ>1 aut.</sZ>
<sZ>2 aut.</sZ>
<sZ>4 aut.</sZ>
<sZ>5 aut.</sZ>
</inist:fA14>
<country>République populaire de Chine</country>
<wicri:noRegion>Dalian 116024</wicri:noRegion>
</affiliation>
</author>
<author><name sortKey="Bingdui Zhu" sort="Bingdui Zhu" uniqKey="Bingdui Zhu" last="Bingdui Zhu">BINGDUI ZHU</name>
<affiliation wicri:level="1"><inist:fA14 i1="01"><s1>Dept. Appl. Math., Dalian University of Technology</s1>
<s2>Dalian 116024</s2>
<s3>CHN</s3>
<sZ>1 aut.</sZ>
<sZ>2 aut.</sZ>
<sZ>4 aut.</sZ>
<sZ>5 aut.</sZ>
</inist:fA14>
<country>République populaire de Chine</country>
<wicri:noRegion>Dalian 116024</wicri:noRegion>
</affiliation>
</author>
</titleStmt>
<publicationStmt><idno type="wicri:source">INIST</idno>
<idno type="inist">07-0525298</idno>
<date when="2006">2006</date>
<idno type="stanalyst">PASCAL 07-0525298 INIST</idno>
<idno type="RBID">Pascal:07-0525298</idno>
<idno type="wicri:Area/PascalFrancis/Corpus">000314</idno>
<idno type="wicri:Area/PascalFrancis/Curation">000472</idno>
<idno type="wicri:Area/PascalFrancis/Checkpoint">000363</idno>
<idno type="wicri:doubleKey">0302-9743:2006:Wei Wu:a:bottom:up</idno>
<idno type="wicri:Area/Main/Merge">001224</idno>
<idno type="wicri:Area/Main/Curation">001190</idno>
<idno type="wicri:Area/Main/Exploration">001190</idno>
</publicationStmt>
<sourceDesc><biblStruct><analytic><title xml:lang="en" level="a">A bottom-up OCR system for mathematical formulas recognition</title>
<author><name sortKey="Wei Wu" sort="Wei Wu" uniqKey="Wei Wu" last="Wei Wu">WEI WU</name>
<affiliation wicri:level="1"><inist:fA14 i1="01"><s1>Dept. Appl. Math., Dalian University of Technology</s1>
<s2>Dalian 116024</s2>
<s3>CHN</s3>
<sZ>1 aut.</sZ>
<sZ>2 aut.</sZ>
<sZ>4 aut.</sZ>
<sZ>5 aut.</sZ>
</inist:fA14>
<country>République populaire de Chine</country>
<wicri:noRegion>Dalian 116024</wicri:noRegion>
</affiliation>
</author>
<author><name sortKey="Feng Li" sort="Feng Li" uniqKey="Feng Li" last="Feng Li">FENG LI</name>
<affiliation wicri:level="1"><inist:fA14 i1="01"><s1>Dept. Appl. Math., Dalian University of Technology</s1>
<s2>Dalian 116024</s2>
<s3>CHN</s3>
<sZ>1 aut.</sZ>
<sZ>2 aut.</sZ>
<sZ>4 aut.</sZ>
<sZ>5 aut.</sZ>
</inist:fA14>
<country>République populaire de Chine</country>
<wicri:noRegion>Dalian 116024</wicri:noRegion>
</affiliation>
</author>
<author><name sortKey="Jun Kong" sort="Jun Kong" uniqKey="Jun Kong" last="Jun Kong">JUN KONG</name>
<affiliation wicri:level="1"><inist:fA14 i1="02"><s1>Northeast Normal University</s1>
<s2>Changchun 130024</s2>
<s3>CHN</s3>
<sZ>3 aut.</sZ>
</inist:fA14>
<country>République populaire de Chine</country>
<placeName><settlement type="city">Changchun</settlement>
<region type="province">Jilin</region>
<region type="groupement">Dongbei</region>
</placeName>
</affiliation>
</author>
<author><name sortKey="Lichang Hou" sort="Lichang Hou" uniqKey="Lichang Hou" last="Lichang Hou">LICHANG HOU</name>
<affiliation wicri:level="1"><inist:fA14 i1="01"><s1>Dept. Appl. Math., Dalian University of Technology</s1>
<s2>Dalian 116024</s2>
<s3>CHN</s3>
<sZ>1 aut.</sZ>
<sZ>2 aut.</sZ>
<sZ>4 aut.</sZ>
<sZ>5 aut.</sZ>
</inist:fA14>
<country>République populaire de Chine</country>
<wicri:noRegion>Dalian 116024</wicri:noRegion>
</affiliation>
</author>
<author><name sortKey="Bingdui Zhu" sort="Bingdui Zhu" uniqKey="Bingdui Zhu" last="Bingdui Zhu">BINGDUI ZHU</name>
<affiliation wicri:level="1"><inist:fA14 i1="01"><s1>Dept. Appl. Math., Dalian University of Technology</s1>
<s2>Dalian 116024</s2>
<s3>CHN</s3>
<sZ>1 aut.</sZ>
<sZ>2 aut.</sZ>
<sZ>4 aut.</sZ>
<sZ>5 aut.</sZ>
</inist:fA14>
<country>République populaire de Chine</country>
<wicri:noRegion>Dalian 116024</wicri:noRegion>
</affiliation>
</author>
</analytic>
<series><title level="j" type="main">Lecture notes in computer science</title>
<idno type="ISSN">0302-9743</idno>
<imprint><date when="2006">2006</date>
</imprint>
</series>
</biblStruct>
</sourceDesc>
<seriesStmt><title level="j" type="main">Lecture notes in computer science</title>
<idno type="ISSN">0302-9743</idno>
</seriesStmt>
</fileDesc>
<profileDesc><textClass><keywords scheme="KwdEn" xml:lang="en"><term>Artificial intelligence</term>
<term>Binary image</term>
<term>Bottom up method</term>
<term>Character recognition</term>
<term>Employment</term>
<term>Grammar</term>
<term>Image processing</term>
<term>Image segmentation</term>
<term>Latex</term>
<term>Mathematical formula</term>
<term>Neural network</term>
<term>Optical character recognition</term>
<term>Pattern recognition</term>
<term>Printed character</term>
<term>Printed document</term>
<term>Self organization</term>
<term>Text editor</term>
</keywords>
<keywords scheme="Pascal" xml:lang="fr"><term>Intelligence artificielle</term>
<term>Reconnaissance caractère</term>
<term>Reconnaissance optique caractère</term>
<term>Image binaire</term>
<term>Reconnaissance forme</term>
<term>Traitement image</term>
<term>Segmentation image</term>
<term>Autoorganisation</term>
<term>Emploi</term>
<term>Formule mathématique</term>
<term>Caractère imprimé</term>
<term>Document imprimé</term>
<term>Grammaire</term>
<term>Latex</term>
<term>Méthode ascendante</term>
<term>Réseau neuronal</term>
<term>Editeur texte</term>
</keywords>
<keywords scheme="Wicri" type="topic" xml:lang="fr"><term>Intelligence artificielle</term>
</keywords>
</textClass>
</profileDesc>
</teiHeader>
<front><div type="abstract" xml:lang="en">An OCR system is presented to understand mathematical formulas in binary printed document images. The system utilizes a novel component-labeling algorithm for extracting local maximum components from image, and uses these components to locate the mathematical formulas. A character recognition algorithm based on neural networks is then adopted. For segmenting merged characters in the image, a novel segmentation algorithm based on a modified SOM neural network was introduced into the system. With the employment of LL(1) grammar, this system can convert the recognition results into a LATEX file.</div>
</front>
</TEI>
<affiliations><list><country><li>République populaire de Chine</li>
</country>
<region><li>Dongbei</li>
<li>Jilin</li>
</region>
<settlement><li>Changchun</li>
</settlement>
</list>
<tree><country name="République populaire de Chine"><noRegion><name sortKey="Wei Wu" sort="Wei Wu" uniqKey="Wei Wu" last="Wei Wu">WEI WU</name>
</noRegion>
<name sortKey="Bingdui Zhu" sort="Bingdui Zhu" uniqKey="Bingdui Zhu" last="Bingdui Zhu">BINGDUI ZHU</name>
<name sortKey="Feng Li" sort="Feng Li" uniqKey="Feng Li" last="Feng Li">FENG LI</name>
<name sortKey="Jun Kong" sort="Jun Kong" uniqKey="Jun Kong" last="Jun Kong">JUN KONG</name>
<name sortKey="Lichang Hou" sort="Lichang Hou" uniqKey="Lichang Hou" last="Lichang Hou">LICHANG HOU</name>
</country>
</tree>
</affiliations>
</record>
To manipulate this document under Unix (Dilib)
EXPLOR_STEP=$WICRI_ROOT/Ticri/CIDE/explor/OcrV1/Data/Main/Exploration
HfdSelect -h $EXPLOR_STEP/biblio.hfd -nk 001190 | SxmlIndent | more
Or
HfdSelect -h $EXPLOR_AREA/Data/Main/Exploration/biblio.hfd -nk 001190 | SxmlIndent | more
To add a link to this page in the Wicri network
{{Explor lien |wiki= Ticri/CIDE |area= OcrV1 |flux= Main |étape= Exploration |type= RBID |clé= Pascal:07-0525298 |texte= A bottom-up OCR system for mathematical formulas recognition }}
This area was generated with Dilib version V0.6.32.